

# **Clock-Less Design Methodology For Digital System Design**

Arun Sankar M S<sup>1</sup>, Vishnu V Gopi<sup>2</sup>, Padmakumar K<sup>3</sup>, Dominic George Joseph<sup>4</sup>

ME Scholar, VLSI & Embedded System, Maharaja Engineering College, Coimbatore, Tamilnadu, India<sup>1</sup>

M Tech Scholar, VLSI & Embedded System, Toc H Institute of Science and Technology, Ernakulam, Kerala, India<sup>2</sup>

Scientist/Engineer, Vikram Sarabhai Space Centre, ISRO, Thiruvananthapuram, Kerala, India<sup>3,4</sup>

Abstract: As system designs have grown in complexity and clock speeds have continued to increase, several limitations of the conceptual framework of synchronous design have become apparent. Notable problems arising from higher performance demands are the difficulty of distributing the clock globally, clock skew, high power dissipation, interfacing difficulties and the need to traverse the chip's longest wire in one clock cycle. It is therefore no surprise that asynchronous circuits and systems, which generally do not suffer from these problems, are gaining importance. Here we consider a research concept that improves digital system implementations, namely asynchronous digital design. Asynchronous systems can be realized using clock-less chip implementation techniques, which avoid the clock. Such a system acts on the arrival and sequence of data, and only when required, thus reducing power consumption, EMI, etc. The proposed methodology ensures the validity of the data by taking care of glitches, delays and hazards. The design of a new methodology for asynchronous system development is discussed in this paper.

Keywords: VLSI, Clock-less system, hazard, Asynchronous design, skew, data completion, Threshold gate, NCL.

### I. **INTRODUCTION**

Functional improvement and performance balancing are crucial to successful microprocessor designs. According to Moore's law, CMOS clock frequency will be around 28.7 GHz by 2016. Rapid developments in VLSI technology have produced smaller circuits and higher speeds of communication. Techniques for improving the performance and functioning of microprocessor designs include pipelining, multi-threading and clock-less design [1].

Traditional digital system designs in the synchronous domain use pipelining and multithreading to increase throughput. Higher clock frequencies degrade performance through clock skew, glitches, metastability, hazards, etc. [3]. Clock skew results in violation of setup and hold times. Consider a synchronous pipeline system, typically modelled as a register followed by combinatorial logic, as shown in figure 1. The minimum computation cycle time ($T_{c(min)}$) depends on the register delay time, the combinatorial path delay, and the setup and hold times.



Fig 1: Block diagram of synchronous pipeline system

 $T_{c(min)} = T_{ot1} + T_{lb} + T_{st} + T_{ot2}$ 

$T_{c(min)}$ : Minimum cycle time

$T_{ot1}$ : Clock to output (ot1) delay of register 1

$T_{lb}$ : Total worst case delay of the logic block

$T_{st}$ : Setup time of register 2

$T_{ot2}$ : Clock to output (ot2) delay of register 2

Consider figure 2. Assume that each clock edge arrives at each register at a precise time (tp). The main sources of delay in the clock distribution networks are RC wire delays, LC ringing on the clock nets and buffer delays.



Fig 2: Skew problems in synchronous domain

Register 1, which is clocked at Tcl1, passes the data to the logic block, and after the operation completes the data is passed to register 2. At that time register 2 passes the previous data to the output terminal. However, wire delays in the clock line can cause incorrect data to be latched at register 2. In synchronous combinatorial blocks with different computation times, the delay of the operation is set by the longest delay path among the blocks. Consider three combinatorial blocks with delays of 10 s, 20 s and 30 s: in the synchronous domain every block is allotted the 30 s delay, which means that synchronous design styles are of the worst case delay type [1][2].
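The cycle time relation and the three-block example above can be illustrated with a minimal Python sketch. The function names and the register timing values are hypothetical, chosen only for illustration, and the average-case figure assumes that the three blocks are exercised equally often, which the text does not state.

```python
from statistics import mean


def min_cycle_time(t_ot1, t_lb, t_st, t_ot2):
    """Minimum cycle time as the sum of the four terms of the relation above."""
    return t_ot1 + t_lb + t_st + t_ot2


def synchronous_delay(block_delays):
    """A clocked design allots every block the delay of the slowest block."""
    return max(block_delays)


def clockless_average_delay(block_delays):
    """Each block takes only its own delay; assumes equal usage of the blocks."""
    return mean(block_delays)


delays = [10, 20, 30]  # the three combinatorial blocks from the text

# Hypothetical register timings (1 unit each) around the slowest logic block.
print("minimum cycle time:", min_cycle_time(1, max(delays), 1, 1))
print("worst case delay (synchronous):", synchronous_delay(delays))
print("average case delay (clock-less):", clockless_average_delay(delays))
```

Under these assumptions the clocked design pays the 30 unit delay on every operation, while the event-driven design averages 20 units.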

The clock signal is generated by an oscillator. A higher clock rate increases power consumption and causes EMI problems for the design. Clock tree distribution overcomes problems in synchronous circuits, such as skew and glitches, that are present in the global distribution of the clock signal. However, the complex oscillator circuitry, the clock distribution tree and the multithreading technique can occupy a large logic area and consume considerable power [3].

INTERNATIONAL JOURNAL OF INNOVATIVE RESEARCH IN ELECTRICAL, ELECTRONICS, INSTRUMENTATION AND CONTROL ENGINEERING Vol. 3, Issue 4, April 2015

In microprocessor architectures such as superscalar and super-pipelined machines, the speed of operation is increased by increasing the complexity of the logic blocks, thereby exponentially loading the clock network [9]. With this technique the throughput is improved, but at the expense of complicated clock tree design techniques that must be incorporated at the physical design stage.

The proposed digital system design is clock-less (asynchronous). This design style eliminates the clock distribution. The absence of a clock eliminates the effects of skew and glitches, but extra care has to be taken in data transfer control to avoid hazards and to ensure data validity in clock-less circuits. The design is an event-driven mechanism, which means that a logic block is processed when data arrives at its input. Without a global clock it is difficult to achieve synchronization, so the output of each logic block is sensed to signal the completion of its operation. Clock-less circuits therefore exploit average case delay, resulting in high speed, robustness, modularity and inbuilt flow control compared with conventional synchronous design [1].

This paper proposes a new methodology for clock-less digital system design. The present clock-less design methodologies, such as Globally Asynchronous Locally Synchronous (GALS) and bundled data transfer, do not fully solve the problems of conventional synchronous designs. To overcome these drawbacks, we propose a new design methodology for fully asynchronous digital system design.

### II. **CLOCK-LESS CIRCUIT DESIGN**

In the clock-less design style, logic blocks do not wait for global clock events; the next block starts its operation when the previous block completes its operation [3]. The general rule in a clock-less system is that data transfer occurs when the next block has returned to the idle state and the previous block has completed an operation and produced a new value at the input of the succeeding block. This is the basic idea of clock-less systems [4][5].

#### A. Classification of Clock-less systems

Clock-less circuit design is classified according to the delays in gates and wires within the circuits and the mode of operation. On the basis of delays, clock-less circuits are classified as bounded and unbounded delay systems. In bounded delay modelling, a predefined amount of delay is assigned to the wire that connects to the next logic block. After that assigned delay, the data arrives at the next logic block. This is a simple technique for converting a synchronous system to a clock-less system. The added delay in the wire is equal to the delay in the synchronous domain, but it is difficult to compute the delay value. Selecting the delay model for all the process corners is very difficult for this type of modelling. The unbounded delay modelling style does not add any predefined delay. It only considers the gate and wire delays. Therefore, after the completion of the logic operation at each block, the data is transferred to the next logic block. The unbounded delay modelling style is widely used.

Unbounded delay systems are further classified into Delay Insensitive (DI), Speed Independent (SI) and Quasi Delay Insensitive (QDI) systems. DI systems are insensitive to gate and wire delays. This type of design is not practically feasible. An SI system considers only the gate delays and neglects wire delays in the circuits. The QDI technique uses the concept of the isochronic fork. The assumption is that signal propagation delays in the interconnected wires are negligible compared with the gate delays. Only quasi delay insensitive design is considered for practical implementation [5].

#### B. Basic principle of clock-less systems

Without a central clock, the blocks in a clock-less system communicate with request (req) and acknowledgement (ack) signals. When a logic block completes its operation, it raises the request signal, which means that the current block is ready to pass the data to the next stage. The data is transferred only when the acknowledgement signal is generated by the next block. The control circuitry in the clock-less system decides the data transfer in the system, as shown in figure 3.



#### C. Basic components of clock-less systems

The main components used for clock-less design are latches, combinatorial logic, encoders, protocol handlers, local coordination circuits, completion detection circuits and hazard reducing circuits [3].

##### a. Latches

Latches are the state-holding elements in the clock-less system. The control circuitry input to a latch determines the transfer of data to succeeding stages.

##### b. Combinatorial logic

Combinatorial logic circuits are placed between the latches. They determine the logic function to be implemented. The complexity of a module is reduced by subdividing the logic functions among a number of blocks.

##### c. Encoders

Data validity is achieved by encoding schemes, which are classified as bundle data and dual rail. The bundle data scheme has request and acknowledgement signals along with a matching delay for the data path. This scheme uses single rail data encoding, where data error correction is difficult. Hence data validity is not assured and there are chances of hazards and glitches. In the dual rail scheme, each data bit is split into two signals, true and false. It has only an acknowledgement signal, as the request is encoded with the data signal. Data error correction is possible in this scheme. Figure 4(a) shows the bundle data system, while figure 4(b) shows the dual rail system, where data transmission takes the form of a data - empty - data cycle. Each data transmission is therefore initiated upon arrival of an empty state. Table 1 shows the data encoding of this scheme. This scheme increases the hardware by raising the number of wires to the order of 2N, where N is the number of data bits in the system.



Fig 4: Encoding Schemes. (a) Bundle data (b) Dual rail

| Data-True | Data-False | Logic value        |
|-----------|------------|--------------------|
| 0         | 0          | Spacer (Null)      |
| 0         | 1          | 0                  |
| 1         | 0          | 1                  |
| 1         | 1          | Not used (Invalid) |

Table 1: Dual rail data representation

##### d. Protocol Handlers

Without a central clock, synchronization is achieved by controlling the transfer of data across channels using some form of handshaking. Consider figure 5. Every transfer of data from a logic block to the succeeding one is initiated by a request and an acknowledgement signal. One signal is used for data validity and the other for data acceptance [5]. Signalling protocols are classified into four phase and two phase protocols. In the four phase signalling protocol, it takes four communication actions to pass each data item. The logic block issues data and sets the request signal high. The succeeding block responds with an acknowledgement signal that is set high when it completes the operation. Then the previous block sets the request signal low, at which point the data is no longer valid, and the next block responds by making the acknowledgement low. At this point the previous block initiates the next data transfer cycle. The two phase signalling protocol responds on both edges of the request and acknowledgement signals. Continuous data transfer may cause data collisions, and hence data validity problems are more likely to occur in the two phase signalling protocol. In the four phase signalling protocol, both request and acknowledgement signals are asserted low after each data transfer, thereby avoiding data collisions [6].



Fig 5: Four phase and two phase handshaking
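The four communication actions of the four phase protocol can be traced with a small Python sketch. This is a purely sequential model written for illustration, not a circuit description: the function name `four_phase_transfer` and the event labels are hypothetical, and real sender and receiver blocks would run concurrently.

```python
def four_phase_transfer(items):
    """Send each item over a four phase (return to zero) handshake.

    Returns the received items and a trace of (event, req, ack) tuples.
    """
    received = []
    trace = []
    req = ack = 0
    for item in items:
        # Action 1: the sender issues data and raises the request.
        req = 1
        trace.append(("req+", req, ack))
        # Action 2: the receiver accepts the data and raises the acknowledgement.
        received.append(item)
        ack = 1
        trace.append(("ack+", req, ack))
        # Action 3: the sender lowers the request; the data is no longer valid.
        req = 0
        trace.append(("req-", req, ack))
        # Action 4: the receiver lowers the acknowledgement; the channel is idle.
        ack = 0
        trace.append(("ack-", req, ack))
    return received, trace


received, trace = four_phase_transfer(["d0", "d1"])
for event in trace:
    print(event)
```

Both control signals are back at zero after every item, which is the property the text relies on to avoid data collisions.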

##### e. Local co-ordination circuit

Hazards and race problems in clock-less digital systems must be eliminated to ensure the validity of signalling events and meaningful data transfer. The basic and simplest local co-ordination circuit element of a clock-less system is the Muller-C element. The acknowledgement signal is the main reference signal of a local co-ordination circuit. The Muller-C element is a state-holding circuit in clock-less systems, similar to a set-reset latch in synchronous systems.



Fig 6: Muller C element

| Input A | Input B | Output Z  |
|---------|---------|-----------|
| 0       | 0       | 0         |
| 0       | 1       | $Z_{n-1}$ |
| 1       | 0       | $Z_{n-1}$ |
| 1       | 1       | 1         |

Table 2: Muller-C element state table
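The state table above can be expressed as a short behavioural model in Python. This is a minimal sketch; the class name `MullerC` and the initial output value of 0 are assumptions made for illustration.

```python
class MullerC:
    """Behavioural model of a two-input Muller-C element (Table 2)."""

    def __init__(self, initial=0):
        self.z = initial  # previous output, Z(n-1)

    def update(self, a, b):
        # When both inputs agree the output follows them;
        # when they differ the previous output is held.
        if a == b:
            self.z = a
        return self.z


c = MullerC()
for a, b in [(0, 1), (1, 1), (1, 0), (0, 0)]:
    print(a, b, c.update(a, b))
```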




##### f. Completion detection circuit

The completion detection circuit in a clock-less system indicates the completion of operation in the logic blocks. Completion detection circuits are normally placed at the output stage of each logic block. Data validity is crucial in complex systems, so completion detection may also be placed at the input of each logic block. A completion detection circuit can be built with Null Convention Logic (NCL) [6]. NCL gates are a type of complex gate that express the processes completely in terms of the logic itself. They are a theoretically complete and economically feasible approach for delay insensitive (DI) circuits. In ordinary digital logic, however, there is no state representation for the null data state. Representing data by dual valued logic gives space for the representation of a null state. This type of data representation reduces hazards and glitches in the clock-less system. The dual valued logic uses the dual rail encoding scheme. If any of the inputs of the complex gate has not made a transition from null to valid data, the output remains in the null state. In this scheme, logic "10" represents the true value and logic "01" represents the false value. NCL gates are discrete threshold gates whose behaviour depends on the number of data values present at the input. The number of asserted inputs to the threshold gate must meet its threshold to turn the gate ON. Figure 7 shows a threshold gate with a threshold of 3 and 5 inputs. If any 3 of the 5 inputs to the gate become logic '1', the output is set to logic '1' [7].



Fig 7: Threshold gate (TH35w0)

##### g. Hazard reducing circuit

Hazards are unwanted data transitions that occur at an output before it settles to its predicted value. They can be handled by null convention logic with feedback, which is a straightforward and inexpensive approach. The output is fed back to the input with a weight one less than the threshold, so unwanted data transitions are ignored in the clock-less system.

### III. **PROPOSED METHODOLOGY**

The main components of the proposed methodology are:

#### A. Four phase protocol

In a clock-less system, synchronization is done by co-ordinating and ordering the flow of data. In a 4-phase system, the signalling protocol is of the Return to Zero (RZ) type, so during each transmission the control signals return to zero before the next transmission starts. The 4-phase system therefore reduces hazards and glitches.

#### B. Dual rail encoding

Data validity issues can be solved by using the dual rail encoding scheme. In this encoding, the request is encoded in the data signal.

#### C. Threshold gates based null convention logic

Completion detection is an important and essential component of a clock-less system. In a digital system data values are represented by '0' or '1'; there is no logic value for the absence of data. The absence of data is represented by another logic value called NULL [6]. This is the same as in dual rail encoding, and it also symbolically represents the completion of data by itself. An NCL gate is a gate with hysteresis behaviour. This hysteresis may be provided by feedback internal to the gate or by some inherent behaviour of the gate implementation approach.



In general, an NCL gate is denoted THmnw, where m is the threshold value, n is the total number of inputs, and w is the weight of the inputs. Figure 8 shows a TH22w0 NCL gate with hysteresis, and its simulation result is shown in figure 9.



Fig 9: TH22w0 simulation result
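The hysteresis behaviour of a threshold gate can be sketched behaviourally in Python. The model below assumes the usual NCL convention that the output is set once at least m inputs are asserted and is cleared only when all inputs have returned to NULL (0); the class name `ThresholdGate` is hypothetical, and input weights are not modelled (the w0 case).

```python
class ThresholdGate:
    """Behavioural model of an unweighted THmn gate with hysteresis."""

    def __init__(self, threshold, n_inputs, initial=0):
        if not 1 <= threshold <= n_inputs:
            raise ValueError("threshold must lie between 1 and n_inputs")
        self.m = threshold
        self.n = n_inputs
        self.out = initial

    def update(self, inputs):
        if len(inputs) != self.n:
            raise ValueError("expected %d inputs" % self.n)
        asserted = sum(1 for x in inputs if x)
        if asserted >= self.m:
            self.out = 1      # threshold met: output set
        elif asserted == 0:
            self.out = 0      # all inputs NULL: output cleared
        # otherwise the previous output is held (hysteresis)
        return self.out


th22 = ThresholdGate(2, 2)
for pattern in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
    print(pattern, th22.update(pattern))
```

With m = n = 2 the model behaves like the Muller-C element of Table 2: the output changes only when both inputs agree.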

Summary of the proposed methodology:

| Delay modelling style   | Unbounded delay         |
|-------------------------|-------------------------|
| Type                    | Quasi delay insensitive |
| Encoding scheme         | Dual rail               |
| Data ordering mechanism | 4 phase (RZ) type       |
| Completion detection    | Threshold gate          |
| Hazard elimination      | NCL                     |

Fig 10: Summary of Proposed methodology
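The dual rail representation of Table 1 and the completion detection it enables can be sketched in Python. The helper names (`encode_bit`, `decode_bit`, `is_complete`, `is_null`) are hypothetical and the model works on values only; it does not represent gate or wire delays.

```python
NULL = (0, 0)  # spacer state: no data present


def encode_bit(bit):
    """Return the (data_true, data_false) rails for a logic value, per Table 1."""
    return (1, 0) if bit else (0, 1)


def decode_bit(rails):
    """Return 1, 0, or None for NULL; the (1, 1) code is rejected as invalid."""
    if rails == (1, 0):
        return 1
    if rails == (0, 1):
        return 0
    if rails == NULL:
        return None
    raise ValueError("(1, 1) is not a valid dual rail code")


def encode_word(bits):
    return [encode_bit(b) for b in bits]


def is_complete(word):
    """Completion detection: every bit of the word carries valid data."""
    return all(rails in ((1, 0), (0, 1)) for rails in word)


def is_null(word):
    """The whole word has returned to the spacer state."""
    return all(rails == NULL for rails in word)


word = encode_word([1, 0, 1])
print(word, "wires:", 2 * len(word))
print("complete:", is_complete(word))
print("null spacer:", is_null([NULL] * 3))
```

An N-bit word occupies 2N wires, and a receiver can tell valid data from the spacer without a separate request line, which is why the data - empty - data cycle fits the four phase protocol.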

### IV. **ADVANTAGES**

Asynchronous design offers the most help to chip designs in which slow actions occur infrequently. Asynchronous design reduces the power consumption of chips, because idle parts of the chip consume negligible power. This feature is particularly valuable for battery-powered equipment. It also reduces the cost of larger systems by avoiding the need for cooling systems. Asynchronous systems produce less radio interference than synchronous machines. Yet another benefit of asynchronous design is that it can build bridges between clocked computers running at different speeds. Moreover, replacing any part with a faster version will improve the speed of the entire system. In contrast, increasing the speed of a clocked system usually requires upgrading every part [1][2].

Nevertheless, synchronous designs remain widely used because EDA tools for them are readily available and because synchronous design is widely taught.

### V. CHALLENGES

Clock-less system development is difficult and complex because of the need for hazard elimination and synchronization. Proper hazard elimination causes a large area overhead. In asynchronous design, complex timing analysis makes it difficult to estimate performance, and the system cannot be tested easily. The lack of CAD tools, immature synthesis methodologies and the lack of designer expertise are reasons why clock-less systems are not widely used.

### VI. CONCLUSION

Clock-less digital systems have advantages over synchronous digital systems: EMI, skew and power consumption are reduced. However, they generally suffer from hazards, data validity issues, high power consumption by the internal subsystems and design complexity. The proposed clock-less methodology is intended to overcome these problems by combining dual rail encoding, four phase handshaking and NCL threshold gates.

### REFERENCES

- [1] A. Davis and S. M. Nowick, "An Introduction to Asynchronous Circuit Design", 1997.
- [2] Ad Peeters, "Handshake Solutions", Asynchronous Silicon Compilation, Symposium 20 Years OOTI, March 26, 2009.
- [3] Adam Megacz, "Clockless Circuits", Berkeley EECS, CS150, 2009.
- [4] Jens Sparsø (Technical University of Denmark) and Steve Furber (The University of Manchester, UK), "Principles of Asynchronous Circuit Design: A Systems Perspective", Kluwer Academic Publishers, Boston/Dordrecht/London, September 2001.
- [5] Krzysztof (Kris) Iniewski, editor, "CMOS Processors and Memories", Chapter 3.
- [6] Karl M. Fant and Scott A. Brandt, "Null Convention Logic", Theseus Logic, Inc., 1997.
- [7] Ivan E. Sutherland and Jo Ebergen, "Computers Without Clocks", Scientific American, August 2002.
- [8] Marc Renaudin, Bachar El Hassan, and Alain Guyot, "A New Asynchronous Pipeline Scheme: Application to the Design of a Self-Timed Ring Divider", IEEE Journal of Solid-State Circuits, vol. 31, 1996.
- [9] Scott C. Smith and Jia Di, "Designing Asynchronous Circuits using NULL Convention Logic", Synthesis Lectures on Digital Circuits and Systems, Mitchell A. Thornton, series editor, Morgan & Claypool Publishers.

### BIOGRAPHIES



**Arun Sankar M S** was born in Kerala, India in 1979. He graduated with a B.E. in Electronics and Communication Engineering from National Institute of Technology, Surat in 2001. He has served various colleges in India and abroad for a period of 11 years. He is currently pursuing his Masters degree in VLSI & Embedded systems from Anna University, Chennai. His areas of interest are digital system design and ASIC design.



**Vishnu V Gopi** was born in Kerala, India in 1989. He graduated with a B.Tech in Electronics and Communication Engineering from M G University in 2012 and obtained a Diploma in Electronics from the Board of Technical Education, Kerala in 2008. He is currently pursuing his Masters degree in VLSI & Embedded systems from Cochin University of Science and Technology, Kerala. His areas of interest are digital system design and analog circuits.



**Padmakumar K** was born in Kerala, India in 1973. He graduated with a B.Tech in Electronics and Communication Engineering from NSS College of Engineering, Palakkad in 1995. He is currently serving the Government of India as Scientist/Engineer in Vikram Sarabhai Space Centre, ISRO, Thiruvananthapuram. He is one of the main architects of the hardware design of the Onboard Computer for ISRO launch vehicles. His areas of interest are Asynchronous Digital Design and ASIC design.



**Dominic George Joseph** was born in Kerala, India in 1984. He graduated with a B.Tech in Electronics and Communication Engineering from University College of Engineering, Thodupuzha in 2006. He is currently serving the Government of India as Scientist/Engineer in Vikram Sarabhai Space Centre, ISRO, Thiruvananthapuram. His areas of interest include Asynchronous Digital Design and Analog ASIC Design.